Protein Science — Latest Matching Preprints

1

Computational Redesign of an Antifreeze Protein Using Deep Learning

Calia, C.; Altunc, A. J.; Eufemio, R. J.; Alvarado, B. O.; Huynh, J. D.; Oh, E.; Burkart, M.; Meister, K.; Paesani, F.

2026-06-24 biophysics 10.64898/2026.06.21.733612 medRxiv

Top 0.1%

38.9%

Antifreeze proteins (AFPs) found in various cold-adapted organisms inhibit ice growth and are of interest for applications in food products, cryopreservation, agriculture, and materials science. Although high-resolution structures are available for several AFPs, the amino acids required for full antifreeze activity remain incompletely defined, and the development of AFP variants with properties such as enhanced solubility, high expression yield, and improved thermostability may further facilitate applications. Here, we used the deep learning model ProteinMPNN to redesign the globular fish antifreeze protein AFPIII, keeping the previously reported ice-binding residues fixed. We readily obtained sequences confidently predicted to adopt AFPIII's structure, and we selected five designed variants for expression, all of which expressed efficiently in E. coli. Circular dichroism spectroscopy showed that two of these variants retained secondary structure elements consistent with AFPIII, whereas the other three exhibited structural differences. One design was predicted and experimentally confirmed to have increased thermostability. All five variants displayed measurable thermal hysteresis activity. However, none reached the activity of wild-type AFPIII, suggesting that maintaining the currently established set of ice-binding residues is not sufficient to fully preserve this AFP's function; other, unidentified residues can also impact its activity. Our findings highlight the value of deep learning-based protein design methods both for generating AFP variants with desirable properties and for uncovering gaps in existing knowledge of well-characterized AFPs.

2

Accurate protein stability prediction for small domains using mega-scale experiments

Cho, Y.; Tsuboyama, K.; Litberg, T. J.; Jung, M. D.; Obisesan, A.; Wang, Q.; Phoumyvong, C. M.; Thibeault, J.; Ovchinnikov, S.; Rocklin, G. J.

2026-05-20 biophysics 10.64898/2026.05.19.726285 medRxiv

Top 0.1%

33.9%

Predicting absolute protein folding stability is a long-standing challenge in biophysics, with broad applications in protein design and in understanding genetic variation and evolution. Physics-based simulations have shown limited success at predicting stability and are often computationally intractable, and machine learning methods have been constrained by the lack of sufficiently large experimental datasets. We recently introduced cDNA display proteolysis, a cell-free approach that can measure folding stability for nearly one million protein domains in parallel. Here, we applied this method to measure stability for 1.8 million diverse protein domains 60-80 amino acids in length primarily taken from the MGnify metagenomic database and spanning over 200,000 sequence families. Using this new "MGnify Stability dataset", we developed the predictive models SaProtΔG and ESM3ΔG, which accurately predict absolute folding stability for small domains with root mean squared error of 0.8 kcal/mol over a 6 kcal/mol range (Spearman rank correlation of 0.88). These predictors show high accuracy at predicting effects of substitutions, insertions, and deletions, successfully identify global trends toward higher stability in thermophilic organisms, and improve discrimination of stable and unstable computationally designed proteins. Our results illustrate how megascale biophysical measurements can complement existing evolutionary and structural data to enable accurate absolute stability prediction for small domains.
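
The two accuracy metrics quoted in this abstract (root mean squared error and Spearman rank correlation) can be sketched in a few lines. This is only an illustration of how such metrics are computed, not the authors' evaluation code; the stability values below are hypothetical and the function names are made up for this example.

```python
import math

def rmse(predicted, measured):
    """Root mean squared error between two equal-length lists."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical folding stabilities in kcal/mol (not data from the preprint).
measured_dg = [0.5, 1.2, 2.8, 3.9, 5.1]
predicted_dg = [0.9, 0.7, 2.5, 4.6, 4.8]

print(f"RMSE     = {rmse(predicted_dg, measured_dg):.3f} kcal/mol")
print(f"Spearman = {spearman(predicted_dg, measured_dg):.3f}")
```

RMSE reports error in the units of the measurement, while Spearman correlation depends only on the ordering, which is why the two numbers are usually quoted together.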

3

Prediction-Guided Design of a More Developable FGF21 Construct

Bozkurt, C.; Nathanail, E.; Goteti, A.

2026-07-14 bioengineering 10.64898/2026.07.13.738140 medRxiv

Top 0.1%

31.6%

For structural-biology and protein-production pipelines, the hardest part of a difficult protein is not the biology -- it is obtaining a well-behaved sample for functional studies. Programs routinely stall at construct design, expression, and purification: deciding where to truncate, which tags to use, how to express, and how to purify so the protein survives concentration and handling. These decisions are still made largely by literature precedent and experimental experience, and they require trial-and-error before arriving at a functional construct for hard targets. We present a prospective, single-pair wet-lab case study testing whether an integrated computational platform can improve these decisions. For human fibroblast growth factor 21 (FGF21) -- a clinically important and stability-challenged metabolic hormone -- we compared two expression constructs produced side by side under the same experimental workflow, using two different design strategies: one designed by a scientist from the literature (reproducing the published core-domain construct, PDB 6M6E), and one designed by the Orbion platform -- an AI, prediction-guided protein-design system (orbion.life) -- which additionally generated the expression and purification protocols (executed scientist-in-the-loop). The platform's construct used an unconventional, longer C-terminal boundary not found in public sequence databases. Since the two constructs differ in more than one feature, we treat them as workflow-level designs throughout. The scientist construct gave a higher initial yield (~2.4× more protein recovered at affinity capture). The platform-designed construct, however, showed a more favourable downstream developability profile: it concentrated higher (1.4 vs 0.7 mg/mL) while remaining more monodisperse by dynamic light scattering (DLS).
The scientist construct, in contrast, aggregated on concentration, so its initial-yield advantage did not survive: in the final concentrated sample the Orbion construct provided the more usable material for downstream studies. Computed for the mammalian host used, the platform had prospectively scored its own design higher (composite 68.7 vs 59.0 for the scientist-designed construct), and its predictions of yield, solubility, and disorder matched the wet-lab outcome. This is a single, deliberately scoped case study, not a population-level benchmark; the two constructs differ in more than one feature, and biological activity was not assayed. Alongside the bottlenecks of this approach discussed here, used as a decision aid, prediction-guided construct and protocol design has the potential to remove costly iteration cycles of protein production campaigns.

4

Prosculpt: Lowering the Barrier to Computational Protein Design

Olivieri, F.; Konstantinova, A.; Ribnikar, N.; Bizjak, N.; Žnidar, ?.; Abel, K.; Rajh, E.; Ljubetič, A.

2026-06-26 synthetic biology 10.64898/2026.06.25.732351 medRxiv

Top 0.1%

31.3%

Over the past decade, protein design has evolved from a specialized discipline into a broadly accessible approach for engineering and interrogating biological systems. Despite these advances, protein design continues to be a technically challenging task, often requiring knowledge of programming to be able to use and combine the different software packages. To address this challenge, we have developed Prosculpt, an easy-to-use protein design pipeline. Prosculpt integrates RFdiffusion for backbone generation, ProteinMPNN for sequence design and multiple structure-prediction platforms (AF2, AF3, Colabfold, Boltz2). Candidate designs are evaluated using customizable Rosetta-based scoring protocols. Each project is specified through a single configuration file, enabling users with minimal computational expertise to perform sophisticated protein design tasks without writing code, while also allowing advanced users to access the full capabilities of the underlying programs. Prosculpt supports a wide range of applications, including design of symmetric homo-oligomers, design of binders, motif scaffolding, partial diffusion and fixed-backbone sequence redesign. By combining these capabilities within a single, user-friendly platform, Prosculpt provides a practical entry point to modern protein design for both novice and expert users.

5

Protein Surface Site Determines the Evolutionary Accessibility of Allosteric Regulation

Dinan, J. C.; McCormick, J. W.; Soni, R.; Thompson, S.; Reynolds, K. A.

2026-07-03 biophysics 10.64898/2026.07.02.735819 medRxiv

Top 0.1%

26.9%

Domain recombination is a major source of new allosteric regulation in both evolved and engineered proteins. However, the sequence and structural features that govern where new allostery may emerge remain poorly understood. Here, we test the hypothesis that the evolutionary accessibility of allosteric regulation following domain insertion is constrained by local surface context, specifically association with pre-existing cooperative networks known as protein sectors. We began with two synthetic domain fusions wherein the Avena sativa light-oxygen-voltage (LOV2) domain was inserted into Escherichia coli dihydrofolate reductase (DHFR) at either a sector-connected or non-sector-connected surface. The insertion sites are only separated by five residues and both DHFR enzymes retain similar catalytic activity, yet the sector-connected version exhibits a light-dependent allosteric phenotype, while the non-sector-connected version does not. Using deep mutational scanning, we measured the effect of nearly all single point mutations on allostery in each chimera. The sector-connected DL121 was significantly more evolvable, possessing numerous allostery-tuning single mutants. In contrast, DL116 lacked statistically significant mutants that introduce allosteric regulation, suggesting the protein surface used by DL116 may be an evolutionary "dead end" for a regulatory phenotype. Surprisingly, DL116 did not show cooperative unfolding at temperatures up to 80 °C, suggesting that enhanced protein stability does not promote the evolvability of allosteric regulation as it does with other phenotypes. Together, our findings show that protein surface context influences the mutational pathways available for allosteric regulation, consistent with the view that sector-connected surface sites harbor a latent capacity for allostery while other locations are more evolutionarily inert.

6

Comparison of AI protein structure ensemble prediction tools

Otten, L.; Leung, J. M. G.; Chong, L. T.; Zuckerman, D. M.

2026-05-30 biophysics 10.64898/2026.05.29.728804 medRxiv

Top 0.1%

26.0%

Multiple AI prediction tools for protein structural ensembles have recently been released, building on the much heralded advances from AlphaFold, large language models, and other machine-learning approaches. Here we report on a comparison of several tools (BioEmu, AFSample2, ESMFlow) using a small test set of proteins, including three which exhibit well-studied structural transitions. Overall, while the AI platforms generate structurally diverse ensembles with overlapping regions, each tool produces clearly distinct conformational distributions. Thus, it is impossible that all the tools generate ensembles of high biophysical quality, analogous to a Boltzmann distribution. Experimental structures are often, but not always, covered by the ensembles in dimensionally reduced spaces. In cases where point mutations are known experimentally to cause large structural shifts, the AI tools exhibit either small or negligible shifts. Although our current analysis cannot evaluate the absolute quality of an ensemble, and hence cannot identify a best-performing AI tool, the results suggest users pursuing downstream applications such as protein engineering or drug design should interpret these ensembles with caution.

7

AlphaFlex: Ensembles of the human proteome representing disordered regions

Liu, Z. H.; Zhang, O.; De Castro, S.; Sun, K.; Ghafouri, H.; Attafi, O. A.; Fawzi, N. L.; Tosatto, S. C. E.; Monzon, A. M.; Moses, A. M.; Head-Gordon, T.; Forman-Kay, J. D.

2026-06-23 biochemistry 10.1101/2025.11.24.690279 medRxiv

Top 0.1%

21.9%

More than two thirds of proteins in the human proteome are predicted to contain intrinsically disordered regions (IDRs), which lack stable folded structure. IDRs are critical for biological regulation and organization, as targets for post-translational modifications, and as mediators of biomolecular condensates. To address the pressing need for better structural models enabling functional insight, we developed AlphaFlex to model fully atomistic conformer ensembles for proteins predicted to have IDRs, modeled in the context of AlphaFold folded domains and an implicit bilayer for transmembrane proteins. The AlphaFlex resource provides conformational ensembles of human proteins from the AlphaFold database with identified IDRs in the Protein Ensemble Database that is mirrored in UniProt. This transformative resource of AlphaFlex ensembles provides physically and biologically relevant full-length models for IDR proteins, including scaffold proteins, those with IDR:folded-domain interactions, regulatory and condensate proteins requiring exposed binding elements, conditionally folding IDRs, and transmembrane proteins containing IDRs.

8

The protein binding domains of staphylococcal protein A fold independently and form an N- to C-terminal gradient of increasing stability.

Hagarman, A.; Franch, W. R.; Oas, T. G.

2026-06-02 biophysics 10.64898/2026.05.31.729144 medRxiv

Top 0.1%

19.0%

Surface factors that contribute to the virulence of Staphylococcus aureus have become therapeutic targets in the treatment of illness associated with this bacterium. Staphylococcal protein A (SpA) is a well-known contributor to S. aureus toxicity and virulence, although relatively little is known about protein A and how its biological function has evolved. SpA is displayed on the surface of the bacterium and contains 5 nearly identical helical (≈60 aa) domains that bind antibodies with high affinity (Kd ≈ 10 nM). The folding free energies of only domains E and B have been determined. In this study we used intrinsic fluorescence detected denaturation to measure the folding thermodynamics of each domain in isolation and in the native multidomain context using a construct that includes the N-terminal half of the mature protein (SpA-N). We also constructed a series of proteins with 1 to 5 repeats of B domain, linked exactly as the five domains of WT SpA are linked. We used nearest neighbor thermodynamic models to explicitly demonstrate that the domains in B domain repeat proteins fold independently. We also showed that the domains in SpA-N fold independently by comparing the folding free energies of domains in isolation and in their multidomain context. Previous dynamic NMR experiments detected highly flexible linkers between domains in 5B, suggesting that the domains of SpA are structurally independent, which is likely responsible for the lack of thermodynamic coupling. Our results also showed a steep increase in domain stability from the N- to C-terminus in SpA-N, from 0.97 ± 0.05 to 5.57 ± 0.28 kcal/mol. We hypothesize that this stability gradient is related to efficient secretion of protein A.
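
To give a feel for what the reported stability gradient means, the sketch below converts a folding stability into an equilibrium folded fraction under a simple two-state model. The assumptions are ours, not the authors': the reported values are treated as positive unfolding free energies, the temperature is taken as 25 °C (the abstract does not state it), and the function name is hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)

def fraction_folded(stability_kcal_per_mol, temperature_k=298.15):
    """Two-state folded fraction for a given stability.

    Assumes stability = free energy of unfolding (positive favors the
    folded state), so K_fold = exp(stability / RT).
    """
    k_fold = math.exp(stability_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))
    return k_fold / (1.0 + k_fold)

# Endpoints of the stability gradient quoted in the abstract.
for label, stability in [("least stable domain", 0.97), ("most stable domain", 5.57)]:
    folded = fraction_folded(stability)
    print(f"{label}: {stability:.2f} kcal/mol -> {100 * folded:.3f}% folded")
```

Under these assumptions the least stable domain spends a substantial fraction of time unfolded at equilibrium, while the most stable one is almost entirely folded.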

9

Artificial intelligence aided design of peptides with custom secondary structure motifs and reduced amino acid alphabets

Brown, S. M.; Cohen, A. B.; Dean, S. N.

2026-05-01 bioinformatics 10.64898/2026.04.29.721096 medRxiv

Top 0.1%

19.0%

Proteins are highly diverse functional polymers where the specific sequence of amino acids, selected from a standard genetically-encoded alphabet of twenty (C20), determines the structure and ultimately the function of the resulting folded protein. This standard alphabet has been identified to be non-randomly distributed in physicochemical properties crucial to both structure-formation and function, often referred to as coverage theory. While machine learning models have drastically improved protein structure prediction, protein design has yet to have similar development. Here we therefore bridge contemporary biological theory with recent advancements in artificial intelligence (AI) to develop and evaluate a generative AI protein design model, trained on hundreds of thousands of proteins within the RCSB PDB, for custom secondary structure motifs using reduced amino acid alphabets. Results indicate an overall success in designing novel proteins with desired secondary structure motifs for a broad range of amino acid alphabets. Interestingly, this tool often captures the full three-dimensional tertiary structure of a target protein despite training only on physicochemical sequence space and DSSP secondary structure. The development of this model advances research across multiple disciplines, from general scientific AI/ML architecture development to protein design for biotechnology, astrobiology, and early-Earth evolutionary biology.

10

Allosteric Protein Chemical Shift Perturbations are Ubiquitous

Benavides, T. L.; Ramelot, T. A.; Montelione, G. T.

2026-05-08 bioinformatics 10.64898/2026.05.04.722792 medRxiv

Top 0.1%

18.8%

While allosteric protein function has been appreciated for decades, the ubiquity of conformational shifts, particularly those distant from the interaction interface, has not been broadly characterized. For example, ligand binding frequently triggers allosteric effects far from the interaction interface, yet the prevalence of these conformational shifts underpinning protein function remains poorly documented. We systematically assessed the generality of allosteric effects as monitored by NMR Chemical Shift Perturbations (CSPs) distant from the interaction interface. In a set of 139 protein-protein complexes, a striking 74% of all significant CSPs are non-local to the binding site. Notably, more than 35% of significant CSPs outside the binding site occur in residues for which the shortest receptor-ligand interatomic distance is more than 10 Å. Every protein analyzed exhibits a significant fraction (> 8%) of CSPs distant from the binding site. This analysis across a large number of protein structures demonstrates and documents that structural plasticity is a ubiquitous and fundamental property of proteins. Significance Statement: Studies of protein dynamics have had a profound impact on biology. Ruth Nussinov famously postulated that multiple protein conformations preexist in dynamic equilibrium, with interconversions that mediate function. While conformational flexibility has been characterized in many specific case studies, the extent to which structural plasticity can be considered a fundamental and ubiquitous property of proteins remains poorly documented. We address a central question: how common is protein structural plasticity? To do so, we compiled a database of protein-protein and protein-peptide complexes with NMR chemical shift data for both bound (holo) and unbound (apo) states.
These data reveal the widespread prevalence of long-range structural perturbations induced by ligand binding, demonstrating that structural plasticity is a pervasive and fundamental property of proteins.
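
The kind of bookkeeping this abstract describes can be sketched as follows: compute a combined 1H/15N chemical shift perturbation per residue, keep the significant ones, and ask what fraction lie far from the ligand. Everything specific here is an assumption for illustration, not the authors' protocol: the weighted-distance formula with a 15N scaling factor of 0.14 is one commonly used convention, the 0.05 ppm significance cutoff is arbitrary, and the residue data are invented. Only the 10 Å distance threshold comes from the abstract.

```python
import math
from dataclasses import dataclass

@dataclass
class ResidueShift:
    name: str
    delta_h_ppm: float               # 1H shift change, bound minus free
    delta_n_ppm: float               # 15N shift change, bound minus free
    min_ligand_distance_a: float     # shortest receptor-ligand distance, in angstroms

def combined_csp(delta_h_ppm, delta_n_ppm, n_weight=0.14):
    """Weighted 1H/15N perturbation (n_weight is an assumed convention)."""
    return math.sqrt(delta_h_ppm ** 2 + (n_weight * delta_n_ppm) ** 2)

def distant_fraction(residues, csp_cutoff_ppm=0.05, distance_cutoff_a=10.0):
    """Fraction of significant CSPs farther than distance_cutoff_a from the ligand."""
    significant = [
        r for r in residues
        if combined_csp(r.delta_h_ppm, r.delta_n_ppm) >= csp_cutoff_ppm
    ]
    if not significant:
        return 0.0
    distant = [r for r in significant if r.min_ligand_distance_a > distance_cutoff_a]
    return len(distant) / len(significant)

# Invented example residues (not data from the preprint).
example = [
    ResidueShift("A10", 0.10, 0.5, 3.5),    # significant, at the interface
    ResidueShift("G45", 0.02, 0.1, 12.0),   # below the CSP cutoff
    ResidueShift("L72", 0.06, 0.4, 15.0),   # significant, far from the ligand
    ResidueShift("K90", 0.08, 0.0, 6.0),    # significant, near the interface
]

print(f"distant fraction of significant CSPs: {distant_fraction(example):.2f}")
```

Both cutoffs strongly affect the resulting percentages, so any comparison with the figures quoted above would need the thresholds actually used in the study.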

11

Systematic Characterization of Thermal Stability Assay Parameters and Application in Discovery of Peptide-Protein Interactions

Richards, D. M.; Zhai, F.; Li, S.; Yu, Q.

2026-05-08 biochemistry 10.64898/2026.05.06.723354 medRxiv

Top 0.1%

18.8%

Thermal proteome profiling (TPP) and its higher-throughput derivative, the proteome integral solubility alteration (PISA) assay, measure changes in protein thermal stability upon ligand binding or other perturbations and have been widely adopted in drug discovery and biomedical research. Though the PISA workflow is straightforward, key parameters, including detergent concentration, methods for removing denatured aggregates, and temperature range selection, vary across studies and can markedly influence assay outcomes. Yet these factors have not been systematically evaluated, limiting rational experimental design and data interpretation. Here, through a combined use of TPP, PISA, tandem mass tag (TMT)-based multiplexing, and computational simulation, we systematically characterize these parameters based on the melting behavior of ~9,000 proteins. We find that reducing detergent concentration elevates apparent Tm by 1.5-2 °C proteome-wide, and aggregate removal by filtration versus centrifugation further alters measurements. We leverage these observations to optimize PISA then apply the optimized conditions to identify the aminopeptidase NPEPPS as a previously uncharacterized binding partner of angiotensin II, a key vasoactive peptide hormone in blood pressure regulation. Together, this work provides a general framework for assay design and data interpretation, and extends the utility of PISA beyond small molecules to dissecting peptide-protein interactions, an increasingly important modality in drug discovery.
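
The relationship between a Tm shift and a PISA-style readout can be illustrated with a toy simulation. This is a minimal sketch under our own assumptions, not the simulation used in the study: the melting curve is modeled as a simple sigmoid, the PISA signal is approximated as the area under that curve over the sampled temperatures, and the temperature window, slope, and Tm values are hypothetical.

```python
import math

def soluble_fraction(temp_c, tm_c, slope_c=2.0):
    """Sigmoidal melting curve: fraction of protein still soluble at temp_c."""
    return 1.0 / (1.0 + math.exp((temp_c - tm_c) / slope_c))

def pisa_integral(tm_c, temps_c, slope_c=2.0):
    """Trapezoidal area under the melting curve across the sampled temperatures."""
    values = [soluble_fraction(t, tm_c, slope_c) for t in temps_c]
    area = 0.0
    for i in range(len(temps_c) - 1):
        area += 0.5 * (values[i] + values[i + 1]) * (temps_c[i + 1] - temps_c[i])
    return area

# Hypothetical temperature window, 43 to 57 degrees C in 1 degree steps.
temps = [43.0 + i for i in range(15)]

control = pisa_integral(50.0, temps)   # assumed baseline Tm
shifted = pisa_integral(52.0, temps)   # same protein with Tm raised by 2 degrees C
log2_ratio = math.log2(shifted / control)

print(f"control area = {control:.2f}, shifted area = {shifted:.2f}")
print(f"log2(shifted / control) = {log2_ratio:.3f}")
```

In this toy model the size of the readout change depends on where Tm sits within the temperature window, which is one reason temperature range selection matters for the assay.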

12

Biophysical and enzymatic comparison of Bacillus safensis and Bacillus subtilis malate dehydrogenase (MDH) enzymes

Zafiropoulo, H. R.; Thomas, J. E.; Cortez, N. R.; Apostol, K.; de Sa, A.; Khosravi, R.; Moore, L.; Berndsen, C. E.; Bibel, B.

2026-05-14 biochemistry 10.64898/2026.05.13.723581 medRxiv

Top 0.1%

18.7%

Species of Bacillus bacteria including Bacillus safensis and Bacillus subtilis are finding increasing uses in biotechnology and bioremediation, thanks in part to their metabolic robustness. Malate dehydrogenase (MDH) is at the heart of central metabolism and thus a better understanding of Bacillus MDH proteins could aid in the optimization of these applications. MDHs of Bacillus spp. belong to the lactate dehydrogenase (LDH)-like class of MDHs, otherwise known as the MDH3 class. Despite wide prevalence in nature among prokaryotes and archaea, this typically homotetrameric class is understudied compared to the MDH1 and MDH2 classes found in eukaryotes. We therefore recombinantly expressed and purified MDH proteins from two societally relevant Bacillus spp. (B. safensis and B. subtilis) and characterized them biophysically (via Size Exclusion Chromatography-Small Angle X-ray Scattering (SEC-SAXS) and Differential Scanning Fluorimetry (DSF)) and enzymatically (via spectroscopic activity assays). As expected based on their high sequence identity, the two MDH orthologs had similar properties in most regards, including a tetrameric structure and high susceptibility to substrate inhibition. However, we uncovered differences in conditional thermal stability, in addition to subtle differences in enzymatic activity that offer insight into the workings of LDH-like MDH. Summary statement: Malate dehydrogenase (MDH) is a fundamental metabolic enzyme, from microbes to mammals, yet comparably little is known about microbial MDH, especially MDH of the tetrameric MDH3 class. We compare the biophysical and enzymatic properties of two such enzymes from the societally relevant bacterial species Bacillus subtilis and Bacillus safensis, offering useful insight with potential biotechnological implications.

13

Molecular Insights into Solvent-Mediated Stabilization and Aggregate Suppression during Refolding of Recombinant Leucyl Aminopeptidase

Das, D.; Kaushik, J. K.

2026-06-14 biochemistry 10.64898/2026.06.12.732004 medRxiv

Top 0.1%

18.6%

Production of recombinant proteins frequently yields inclusion bodies that must undergo refolding to yield active protein. Here, we optimized the refolding conditions for the recombinant leucyl aminopeptidase (rPepL) from Lacticaseibacillus casei expressed in inclusion bodies from E. coli. Several chemical additives were assessed for how well they facilitated an increase in refolding efficiency. The best, 0.5 M L-arginine, yielded 50.8% refolding. The addition of stabilizers, such as sucrose and glycerol, with L-arginine further increased yields to 85%. Urea at lower concentrations (0.25-0.5 M) also facilitated an increase in the refolding yield when co-added with L-arginine, whereas guanidinium chloride inhibited it. Sugars and polyols exhibited dose-dependent effects, with ranges for optima also defined. Fluorescence spectroscopy verified enhancements in the refolding under the optimized conditions. Molecular dynamics simulation under mixed solvent conditions provided atomic insights about stabilizing interactions that are likely to facilitate increased refolding. The results show that a series of aggregation suppressors and protein stabilizers can, in a collaborative way, increase the refolding efficiency for the recombinant proteins from the inclusion bodies. The protocol with the optimization using the additives L-arginine, sucrose, and glycerol is an efficient method for the production of active rPepL. This article outlines the best refolding method to recover recombinant leucyl aminopeptidase from inclusion bodies of E. coli using L-arginine combined with sucrose and glycerol. The combined experimental observations and computational simulations elucidate the molecular process of additive-induced stabilization, showing how aggregation inhibition and hydrogen-bonded stabilization act synergistically. The results presented herein provide both mechanistic understanding and experimental guidance for improving protein refolding.

14

Benchmarking AI Protein Structure Predictors Reveals a Persistent Bias in Multi-State Proteins

Ye, M.; Wang, Y.-H.; Brogi, M.; Parks, J. M.; Kuo, K. M.; Gumbart, J. C.

2026-07-11 biophysics 10.64898/2026.07.10.737860 medRxiv

Top 0.1%

18.5%

Protein structure predictors achieve high single-state accuracy, but it remains unclear whether they can recover functionally relevant conformational ensembles or account for the presence of ligands and/or binding partners. Here, we benchmark AlphaFold3, Boltz-2, Chai-1, and BioEmu on four canonical multi-state proteins (Pf-MATE, LAO, SecA, and β2AR), quantifying state bias and sampling breadth against experimental reference structures. Models frequently default to a dominant state represented in the PDB; small-molecule ligands have weak or inconsistent effects, while large protein partners drive clear conformational switching between states. Multiple sequence alignment (MSA)-based approaches (AF-Cluster and random subsampling) recapitulate similar biases, indicating that this behavior is not unique to newer architectures. These results underscore current limitations for multi-state protein structure prediction and structure-guided ligand discovery.

15

The turn less taken: Investigating patterns in β-turn dynamics using large-scale molecular dynamics data

Zhang, S.; Maddipatla, S. A.; Vedula, S.; Marx, A.; Bronstein, A. M.

2026-05-08 biochemistry 10.64898/2026.05.07.721674 medRxiv

Top 0.1%

18.4%

β-turns are among the most common structural motifs in proteins, yet their conformational dynamics and sequence determinants remain incompletely understood. Here we present a data-driven classification and dynamic analysis of β-turn conformations using large-scale molecular dynamics trajectories from the mdCATH database. Clustering of backbone dihedral angles using a cross-bond Ramachandran representation identifies six β-turn types, including a previously uncharacterized hybrid I/I' cluster that combines geometric features of canonical type I and I' conformations. Time-resolved analysis indicates that this hybrid state acts as a transient intermediate state of β-turns. Transitions observed in molecular dynamics simulations closely match NMR ensembles and altlocs detected in X-ray crystal structures, with the most dominant exchanges occurring between type I and II, and between type I' and II' turns. Sequence analysis shows that each turn type exhibits characteristic amino acid preferences at the central residues (i + 1 and i + 2). Within these overall preferences, specific residue pairs display distinct biases toward static or dynamic behavior. Targeted in silico substitutions that interchange dynamic- and static-enriched residue pairs shift the conformational behavior of turns accordingly, providing direct support for these sequence-dynamics relationships. Analysis of flanking secondary-structure environments reveals that structural context further modulates turn flexibility, with strand- and coil-associated turns exhibiting higher dynamic propensity than helix-associated turns. Together, these results reveal how sequence composition and structural context jointly shape the conformational landscape of β-turns.
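
As a point of reference for the turn types named in this abstract, the sketch below assigns a turn to the nearest canonical type from the dihedral angles of its two central residues. It is deliberately not the authors' method: the preprint clusters trajectories in a cross-bond Ramachandran representation, whereas this example uses simple nearest-reference assignment against the textbook idealized angles for types I, I', II, and II'. The 40 degree tolerance and the function names are assumptions made for illustration.

```python
import math

# Idealized (phi, psi) angles in degrees for residues i+1 and i+2.
CANONICAL_TURNS = {
    "I":   (-60.0,  -30.0, -90.0, 0.0),
    "I'":  ( 60.0,   30.0,  90.0, 0.0),
    "II":  (-60.0,  120.0,  80.0, 0.0),
    "II'": ( 60.0, -120.0, -80.0, 0.0),
}

def angle_diff(a_deg, b_deg):
    """Signed difference a - b wrapped into [-180, 180) degrees."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0

def assign_turn_type(phi1, psi1, phi2, psi2, max_rms_deg=40.0):
    """Return the closest canonical turn type, or 'other' if none is close.

    max_rms_deg is an assumed tolerance on the RMS angular deviation.
    """
    observed = (phi1, psi1, phi2, psi2)
    best_name, best_rms = "other", float("inf")
    for name, reference in CANONICAL_TURNS.items():
        squared = [angle_diff(o, r) ** 2 for o, r in zip(observed, reference)]
        rms = math.sqrt(sum(squared) / len(squared))
        if rms < best_rms:
            best_name, best_rms = name, rms
    return best_name if best_rms <= max_rms_deg else "other"

print(assign_turn_type(-62.0, -28.0, -95.0, 5.0))
print(assign_turn_type(58.0, -125.0, -85.0, 10.0))
```

Wrapping angle differences into a single period matters here, because dihedral angles near +180 and -180 degrees describe nearly the same conformation.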

16

Amylo-Pipe: an integrated web server for mechanistic and kinetic prediction of protein and peptide aggregation

Rawat, P.; Ramakrishnan, P.; Cardente, N.; Kumar, S.; Greiff, V.; Gromiha, M. M.

2026-06-11 bioinformatics 10.64898/2026.06.09.731090 medRxiv

Top 0.1%

18.2%

Protein aggregation is central to amyloid-related disorders and remains a major developability challenge for protein therapeutics. Over the past two decades, significant advances have been made to predict aggregation-prone regions (APRs) and estimate aggregation propensity in proteins and peptides. In contrast, the prediction of aggregation kinetics has received relatively less attention due to the limited availability and heterogeneity of experimental data. Consequently, aggregation propensities from APR prediction algorithms were widely accepted as a means to predict relative changes in the aggregation kinetics of proteins and mutants. Previous studies have demonstrated, using large-scale datasets, that aggregation propensity shows a weak or inconsistent correlation with aggregation kinetics. In the present study, we have integrated complementary state-of-the-art mechanistic and kinetic prediction tools for protein aggregation into a unified, user-friendly web framework entitled "Amylo-Pipe". Amylo-Pipe also implements practical features that are especially useful for protein engineering, such as gatekeeper-residue mutational scanning to support the design of aggregation-resistant variants. By consolidating multiple prediction tasks in a single interface, Amylo-Pipe enables a more comprehensive assessment of aggregation behavior than APR-only workflows. The web server is freely accessible at: https://web.iitm.ac.in/bioinfo2/amylopipe/.

17

Phosphorylation Mimicking Mutations Cause TDP-43 to Adopt Different Fibril Conformations

Fonda, B. D.; Murray, D. T.

2026-05-17 biophysics 10.64898/2026.05.14.725298 medRxiv

Top 0.1%

16.7%

The Tar-DNA Binding Protein-43 C-terminal region, TDP43LC, has been previously shown to form amyloid-like fibrils with distinct folds in ALS and FTD. In both diseases, proteinaceous inclusions contain TDP43 C-terminal protein fragments as well as phosphorylated TDP43. Here, we use solution NMR to show that soluble phosphomimetic TDP43LC, P-TDP43LC, is structurally similar to wild-type TDP43LC. Disperse P-TDP43LC, like wild-type protein, contains a central helical region flanked by long disordered regions. Despite this similarity, our turbidity measurements, imaging, and kinetic assays show that P-TDP43LC has different aggregation behavior than wild-type protein. Using solid state NMR measurements, we find that phosphomimetic mutations alter the wild-type fibril conformation. Electrostatic repulsion from negatively charged sidechains, despite having little effect on the soluble protein's structure, perturbs amyloid-like fibril formation and selects for a different conformation in vitro. These results shed light on the structural role of TDP43LC phosphorylation in fibril formation in disease. Synopsis: Phosphomimetic mutations at ALS and FTD neurodegeneration-associated sites in an amyloid forming protein perturb the aggregated structure compared to wild-type protein.

18

TorchRef: An open-source PyTorch Framework for Crystallographic Refinement

Weinert, T.; Standfuss, J.; Seidel, H. P.

2026-05-16 bioinformatics 10.64898/2026.05.13.724821 medRxiv

Top 0.1%

15.4%

Macromolecular crystallographic refinement underpins structural biology, yet existing software packages often lack accessible, modular codebases amenable to rapid method development. Here, we introduce TorchRef, a PyTorch-based crystallographic refinement framework that exposes all refinable parameters (atomic coordinates, displacement parameters, occupancies, and scale factors) to automatic differentiation. The framework implements FFT-based structure-factor calculations, the French-Wilson treatment of intensities, bulk-solvent modeling with established mask parameters, and stereochemical restraints from the CCP4 Monomer Library. A modular target architecture allows loss functions to be combined, weighted, and extended independently of the core refinement machinery. Validation against 1,000 PDB structures demonstrates that TorchRef-based refinement reproduces a median R-free within 1% of Phenix while maintaining comparable model quality. Structure-factor calculation in TorchRef scales readily across multiple CPU cores and is over 100 times faster on modern GPUs than CCTBX. To showcase how modern methods such as time-resolved crystallography can benefit from the flexibility that TorchRef provides, we implemented direct refinement of a typical time-resolved model against amplitude differences, a use case not currently explored by classic refinement programs. TorchRef is released under the MIT license with full API documentation and tutorials, providing an accessible platform for developing and testing new crystallographic refinement protocols. Synopsis: TorchRef is an open-source PyTorch-based crystallographic refinement framework that exposes all refinable parameters to automatic differentiation, delivers GPU-accelerated structure-factor evaluation more than 100x faster than CCTBX, and enables new workflows, such as direct refinement against amplitude differences in time-resolved crystallography.
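
The R-free comparison in this abstract rests on the standard crystallographic R-factor, R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|), evaluated separately on the working reflections and on a held-out "free" set. The following is a minimal plain-Python sketch of that metric only. It is not TorchRef's API: the function names are hypothetical, and the single overall scale factor k is a simplifying assumption (real refinement programs fit scale and bulk-solvent terms more carefully).

```python
def overall_scale(f_obs, f_calc):
    """Single overall scale k that matches the summed amplitudes (a simplification)."""
    return sum(f_obs) / sum(f_calc)


def r_factor(f_obs, f_calc, scale=None):
    """R = sum(|Fobs - k*Fcalc|) / sum(Fobs) for non-negative amplitudes."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("f_obs and f_calc must be non-empty and of equal length")
    if scale is None:
        scale = overall_scale(f_obs, f_calc)
    residual = sum(abs(o - scale * c) for o, c in zip(f_obs, f_calc))
    return residual / sum(f_obs)


def r_work_and_free(f_obs, f_calc, free_flags):
    """Return (R-work, R-free); the scale is taken from the working set only."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    work_obs, work_calc = zip(*work)
    test_obs, test_calc = zip(*test)
    k = overall_scale(work_obs, work_calc)
    return (r_factor(work_obs, work_calc, k), r_factor(test_obs, test_calc, k))
```

Because the free reflections are excluded from fitting, R-free acts as a cross-validation check: a model that overfits the working set tends to show an R-free noticeably higher than its R-work.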

19

Hashi: Bridging Statistical Model Derived 1D Microstate Encodings and Protein 3D Structural Ensembles

Naganathan, A. N.; Madhan, H.

2026-06-02 biophysics 10.64898/2026.06.01.729173 medRxiv

Top 0.1%

15.3%

The functioning of proteins is intimately linked to the conformational states they sample within the native ensemble. Generating ensembles from a single static structure is therefore a research domain receiving considerable attention. In this application note, we introduce Hashi, a pipeline to rapidly generate realistic structural ensembles from the outputs of the structure-based Wako-Saito-Munoz-Eaton (WSME) statistical mechanical model of protein folding. The approach integrates two kinds of block WSME model output with the RANCH module of the EOM (ensemble optimization method) from the ATSAS software suite: strings of zeros and ones describing the conformational status of every residue over thousands or millions of microstates, each assigned a statistical weight derived from physically grounded energy-entropy terms, and free energy profiles. The result is a three-dimensional view of the structural ensembles within the model framework. Hashi is applicable to a variety of single-chain monomeric systems with lengths ranging from 30 to 500 residues, including globular and repeat proteins. The generated structural ensembles can also be rank-ordered according to their free energies within a given macrostate or a range of reaction coordinate values. Since the statistical weights of the WSME model microstates can be reweighted or calibrated with experiments, the ensembles shed light not just on the folding mechanism but also on the structural excursions that determine function and the opening of otherwise buried binding pockets.
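
The rank-ordering step described in this abstract can be illustrated with a small sketch. It assumes that each microstate is a binary string in which "1" marks a structured residue, that the reaction coordinate is the number of structured residues, and that a microstate's free energy is -RT ln(w) for statistical weight w. The function name, input layout, and example weights are hypothetical and are not taken from Hashi itself.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def rank_microstates(weights, temperature=298.0, rc_range=None):
    """Rank microstates by free energy, lowest first.

    weights: dict mapping a binary string (assumed: '1' = structured residue)
             to a positive statistical weight.
    rc_range: optional (low, high) inclusive bounds on the number of
              structured residues, used here as the reaction coordinate.
    Returns a list of (microstate, n_structured, free_energy, probability).
    Probabilities are normalized over ALL microstates, not just the filtered ones.
    """
    partition_function = sum(weights.values())
    rt = R_KCAL * temperature
    rows = []
    for state, w in weights.items():
        n_structured = state.count("1")
        if rc_range is not None and not (rc_range[0] <= n_structured <= rc_range[1]):
            continue
        free_energy = -rt * math.log(w)  # kcal/mol, relative to a weight of 1
        rows.append((state, n_structured, free_energy, w / partition_function))
    rows.sort(key=lambda row: row[2])
    return rows


# Toy four-residue example with made-up weights.
example = {"0000": 1.0, "1100": 4.0, "0011": 2.0, "1111": 20.0}
ranked = rank_microstates(example)
half_folded = rank_microstates(example, rc_range=(2, 2))
```

Filtering on `rc_range` mirrors selecting a macrostate or a window of reaction coordinate values before choosing which microstates to build as three-dimensional models.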

20

Folding the unfoldable 2: using AlphaFold and ESMFold to explore spurious proteins

Orr, A. K.; Bateman, A.

2026-06-10 bioinformatics 10.64898/2026.06.09.728211 medRxiv

Top 0.1%

15.0%

Motivation: Spurious protein sequences, resulting from gene prediction errors, theoretically should not yield folded structures. AlphaFold2 was previously shown to predict short spurious sequences with high pLDDT scores and was therefore unlikely to distinguish between real proteins and spurious proteins, which are usually short. We evaluate whether newer structure prediction methods (ESMFold and AlphaFold3) similarly predict short sequences with high pLDDT or whether they better discriminate between spurious and real proteins. Results: All three structure prediction methods (ESMFold, AlphaFold2, and AlphaFold3) predict short spurious sequences from AntiFam with unexpectedly high pLDDT scores; however, the discrimination between spurious and real proteins improves beyond 100 amino acids. By analysing sequences with disparate pTM and pLDDT scores, we identified two likely spurious shadow ORFs in Swiss-Prot and one potentially non-spurious AntiFam entry. Using the structure prediction scores, we developed a Gaussian Process Model and evaluated its performance on AlphaFold DB, identifying potential spurious proteins at scale. While limited on its own, this model can increase confidence in spurious protein identification when combined with other methods. Availability: Structure predictions are available at https://doi.org/10.5281/zenodo.18390113. Model implementation and figure generation code are available at https://github.com/0rra/fold_unfold2.